
All Questions

1 vote
1 answer
87 views

Deep RL problem: Loss decreases but agent doesn't learn

I'm implementing a basic Vanilla Policy Gradient algorithm for the CartPole-v1 gymnasium environment, and I don't know what I'm doing wrong. No matter what I try, during the training loop the loss ...
wildBass
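A common source of confusion with vanilla policy gradient is that the surrogate loss is not a performance metric. Its gradient is the policy-gradient estimate, but its value depends on the batch just collected, so it can fall while episode returns stay flat. Below is a minimal NumPy sketch of how the loss for one episode could be built from discounted rewards-to-go. The function names are hypothetical, not taken from the question. In a real PyTorch or JAX implementation, `log_probs` would be differentiable tensors.

```python
import numpy as np

def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each timestep to the end of the episode."""
    rtg = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def vpg_surrogate_loss(log_probs, rewards, gamma=0.99, normalize=True):
    """Negative surrogate objective: -mean(log pi(a_t|s_t) * G_t).

    Hypothetical helper. Only its gradient is meaningful; its value is not
    a measure of policy quality.
    """
    returns = rewards_to_go(rewards, gamma)
    if normalize and len(returns) > 1:
        # Standardizing returns is a common variance-reduction trick.
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -np.mean(np.asarray(log_probs) * returns)
```

Mean episode return is usually a more reliable signal than this loss for judging whether the agent is learning.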
1 vote
1 answer
603 views

What is the problem with my implementation of actor-critic?

I have been implementing both REINFORCE with baseline and actor-critic to solve CartPole-v1. As a reminder, here is how the algorithms are presented in Sutton and Barto's book (http://...
Labo · 121
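For comparison with REINFORCE with baseline, here is a minimal tabular sketch of the one-step actor-critic update as described in Sutton and Barto. The TD error δ = R + γV(S') − V(S) replaces (G − baseline). V(S') is taken as 0 at terminal states, and the actor step is scaled by the running discount I. The tabular softmax policy and all names and step sizes are assumptions for illustration only. CartPole has continuous states, so an actual solution would use function approximation.

```python
import numpy as np

def softmax(prefs):
    z = prefs - prefs.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(theta, v, s, a, r, s_next, done, I,
                      gamma=0.99, alpha_theta=0.1, alpha_w=0.1):
    """One-step actor-critic update (hypothetical tabular version).

    theta: (n_states, n_actions) action preferences, updated in place
    v:     (n_states,) state-value estimates, updated in place
    Returns the updated discount factor I (gamma * I).
    """
    v_next = 0.0 if done else v[s_next]    # terminal states have value 0
    delta = r + gamma * v_next - v[s]       # TD error replaces (G - baseline)
    v[s] += alpha_w * delta                 # critic step (tabular gradient is 1)
    pi = softmax(theta[s])
    grad_log_pi = -pi                       # grad of ln pi(a|s): onehot(a) - pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha_theta * I * delta * grad_log_pi   # actor step
    return gamma * I
```

I starts at 1.0 at the beginning of each episode. Both updates use the same δ, computed before the critic is changed.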
1 vote
1 answer
2k views

DDPG doesn't converge for MountainCarContinuous-v0 gym environment

I am trying to implement the Deep Deterministic Policy Gradient (DDPG) algorithm, following the paper "Continuous control with deep reinforcement learning", on the MountainCarContinuous-v0 gym environment. I ...
Vedant Shah
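Two DDPG components that are easy to get subtly wrong are the temporally correlated exploration noise and the soft target-network update. Below is a minimal NumPy sketch of both, assuming commonly cited defaults (θ = 0.15, σ = 0.2, τ = 0.001). The class and function names are hypothetical. MountainCarContinuous-v0 rewards reaching the goal but penalizes action magnitude, so the scale of exploration noise can strongly affect whether the goal is ever found.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck noise, Euler-Maruyama discretization (hypothetical helper)."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu = mu * np.ones(size)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its mean, e.g. at the start of each episode.
        self.x = self.mu.copy()

    def sample(self):
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape))
        self.x = self.x + dx
        return self.x

def soft_update(target_params, source_params, tau=0.001):
    """Polyak averaging, in place: target <- tau * source + (1 - tau) * target."""
    for t, s in zip(target_params, source_params):
        t *= (1.0 - tau)
        t += tau * s
```

The noise sample is added to the actor's output (then clipped to the action bounds) only when acting, not when computing the critic's target.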
1 vote
0 answers
391 views

Will subtracting the entropy from our policy gradient prevent our agent from getting stuck in a local minimum?

In information theory, entropy is a measure of uncertainty in a system. Applied to an agent's policy, entropy shows how uncertain the agent is about which action to take. In math ...
jgauth
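A minimal NumPy sketch of the usual entropy bonus for a discrete policy is shown below. Entropy H(π(·|s)) = −Σ_a π(a|s) log π(a|s) is subtracted from the loss, which is the same as adding it to the objective being maximized. The coefficient β and the function names are assumptions for illustration. This makes near-deterministic policies more expensive, which is the usual argument for why it can delay premature convergence.

```python
import numpy as np

def categorical_entropy(probs, eps=1e-12):
    """Per-row entropy -sum_a p(a) log p(a); eps guards against log(0)."""
    probs = np.asarray(probs, dtype=np.float64)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def pg_loss_with_entropy(log_probs_taken, advantages, probs, beta=0.01):
    """Hypothetical policy-gradient loss with an entropy bonus.

    Minimizing this maximizes E[log pi * A] + beta * H(pi).
    """
    pg = -np.mean(np.asarray(log_probs_taken) * np.asarray(advantages))
    return pg - beta * np.mean(categorical_entropy(probs))
```

A uniform policy over two actions has entropy ln 2, the maximum for two actions, and a deterministic one has entropy 0.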
3 votes
0 answers
44 views

Reinforcement learning on a quantum circuit

I am trying to teach an agent to take any random 1-qubit state to a uniform superposition. So basically, the full circuit will be ...
Sarvagya Gupta
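The excerpt is truncated, so the actual circuit and reward are unknown. Below is one possible reward signal, sketched as a NumPy statevector simulation. It assumes the target is |+⟩ = (|0⟩ + |1⟩)/√2 and uses fidelity |⟨+|ψ⟩|² as the reward. If any equal-magnitude superposition is acceptable regardless of relative phase, the second, phase-agnostic score applies instead. All names here are hypothetical.

```python
import numpy as np

PLUS = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)       # |+>
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)    # Hadamard gate

def random_qubit_state(rng):
    """Haar-random 1-qubit state: normalized complex Gaussian vector."""
    v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    return v / np.linalg.norm(v)

def fidelity(psi, target=PLUS):
    """|<target|psi>|^2; equals 1 iff psi equals target up to a global phase."""
    return float(np.abs(np.vdot(target, psi)) ** 2)

def equal_weight_score(psi):
    """1 - | |a|^2 - |b|^2 |; equals 1 for any equal-magnitude superposition."""
    p = np.abs(psi) ** 2
    return float(1.0 - abs(p[0] - p[1]))
```

For example, applying H to |0⟩ gives fidelity 1, while H applied to |1⟩ gives |−⟩, which scores 0 on fidelity with |+⟩ but 1 on the phase-agnostic score.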
